Trust-Aware Federated Learning for Network Intrusion Detection

Muhammad Usman Khan

Keywords: Network Security, Autoencoder, FedAvg, CICIDS-2017, Poisoning Defence

Abstract

This work presents a Trust-Aware Federated Intrusion Detection System (IDS) trained on the CICIDS-2017 benchmark, a dataset of ~2.5 million labelled network flow records spanning 15 traffic classes. The system combines a dual-head autoencoder (simultaneous reconstruction and binary classification) with a Federated Learning (FL) framework in which each participating client retains its raw traffic data locally; only model weight updates are communicated to the central aggregation server. A lightweight A/R/C trust mechanism scores every client's accuracy, reconstruction quality, and weight divergence before each FedAvg aggregation and filters out unreliable or actively malicious participants. The model is additionally evaluated under a zero-day holdout protocol, in which four rare attack classes are withheld entirely from training, and against three poisoning schemes (label flip, feature noise, backdoor) to test the robustness of the defence. Fusing the classifier score with the per-sample reconstruction error gives higher recall on unseen threats than the classifier alone.

1. Introduction

Modern network infrastructure is exposed to an ever-growing catalogue of cyber threats — from volumetric denial-of-service floods to stealthy low-and-slow infiltration campaigns. Traditional IDS approaches centralise raw traffic logs on a single training server, which creates significant privacy and regulatory risks for organisations operating across jurisdictions or sharing sensitive operational data.

Federated Learning (FL) offers a compelling alternative: clients train local model replicas on their own private data shards, then contribute only weight updates to a global model via a central aggregation server. No raw packets or flow records ever leave the originating node. This data-locality property is attractive for industrial control systems, healthcare networks, and inter-organisational threat intelligence sharing, domains where data sovereignty is non-negotiable.

However, FL introduces its own threat surface. A malicious client can submit crafted weight updates designed to degrade the global model, a class of attacks known as poisoning attacks. At the same time, distributed training makes novel attack patterns harder to detect: a single client shard may contain too few samples of a rare attack type for the local model to generalise. This work addresses both challenges through a unified architecture.

Core Research Questions: Can a federated IDS match centralised performance on seen attacks? Does the reconstruction head of a dual autoencoder provide a reliable signal for zero-day threats? Does the A/R/C trust gate mitigate poisoning without knowing which clients are malicious?

2. About the Dataset — CICIDS-2017

The Canadian Institute for Cybersecurity Intrusion Detection System 2017 (CICIDS-2017) dataset is one of the most widely used benchmarks in network security research. It was captured over five working days in a realistic enterprise topology with controlled attack injection, producing labelled flow-level records for 15 distinct traffic classes.

Property | Value
Total samples | ~2.5 million (after deduplication)
Features | 63 numeric flow statistics (after dropping 8 constant columns)
Traffic classes | 15 (1 benign + 14 attack types)
Capture period | Monday–Friday, July 3–7 2017
Source files | 8 CSV files spanning the five capture days, ~100–400 MB each
Majority class | BENIGN (~83% of all records)
Rarest class | Heartbleed (11 samples total)

Class Distribution

Figure 1: Class distribution in CICIDS-2017 — severe imbalance between BENIGN and rare attack types

Zero-Day Holdout Classes

Four rare, high-impact classes are designated as zero-day and are completely excluded from all training and validation sets. They are evaluated only at test time to measure the model's ability to generalise to previously unseen attack types:

Zero-Day Class | Total Samples | Threat Category
Heartbleed | 11 | Memory disclosure / TLS vulnerability
Infiltration | 36 | Multi-stage lateral movement
Web_Attack_SQL_Injection | 21 | Web application exploit
Web_Attack_XSS | 652 | Cross-site scripting

3. Research Methodology

The methodology is structured in five stages: data loading and cleaning, feature engineering and class balancing, zero-day split and artifact persistence, federated client partitioning, and trust-aware federated training with dual evaluation (seen attacks and zero-day).

3.1 Hardened Data Loading & Preprocessing

The CICIDS-2017 dataset is distributed across 8 separate CSV files covering the five capture days (some days are split into more than one file). A robust loading pipeline was implemented to handle the encoding inconsistencies common in this dataset:

1. Deterministic file enumeration: Files are sorted before loading to guarantee reproducibility regardless of filesystem ordering.
2. Column name sanitisation: UTF-8 byte sanitisation strips BOM characters; strip() removes leading/trailing whitespace. The common pitfall of a ' Label' header failing to match 'Label' is handled automatically.
3. Label harmonisation: Unicode dashes in "Web Attack" class names differ between files. All variants are normalised to underscore-separated ASCII strings (e.g., Web_Attack_BruteForce).
4. Constant column removal: Eight zero-variance features (e.g., Bwd_PSH_Flags, Fwd_Avg_BytesBulk) are detected and dropped automatically, reducing noise for downstream models.
5. Inf / NaN imputation: Division-by-zero packet rates produce ±inf; these are replaced with NaN and imputed using per-column median values (robust to skewed distributions).
6. Label encoding: Two parallel targets are created: Label_Binary (0 = BENIGN, 1 = any attack) and Label_MultiClass (integer codes 0–14 via LabelEncoder).
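
The sanitisation, imputation and binary-encoding steps can be illustrated with a minimal, standard-library sketch. The function names are hypothetical, and the exact set of dash variants handled by the original pipeline is an assumption; plain lists stand in for DataFrame columns.

```python
import math
from statistics import median

# Dash-like characters assumed to appear in raw labels (an assumption,
# not taken from the original pipeline).
DASH_VARIANTS = ("\u2013", "\u2014", "\ufffd", "\x96")


def clean_column(name: str) -> str:
    """Strip a UTF-8 BOM and surrounding whitespace from a header cell."""
    return name.replace("\ufeff", "").strip()


def harmonise_label(label: str) -> str:
    """Replace dash variants with spaces, then join words with underscores."""
    for dash in DASH_VARIANTS:
        label = label.replace(dash, " ")
    return "_".join(label.split())


def impute_column(values: list[float]) -> list[float]:
    """Replace +/-inf and NaN with the median of the finite values."""
    finite = [v for v in values if math.isfinite(v)]
    fill = median(finite) if finite else 0.0
    return [v if math.isfinite(v) else fill for v in values]


def binary_label(label: str) -> int:
    """0 for BENIGN, 1 for any attack class."""
    return 0 if label == "BENIGN" else 1
```

Treating inf and NaN in a single pass is equivalent to the two-step "replace with NaN, then impute" description, because both are excluded from the median.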

3.2 Class Imbalance — Targeted SMOTE

The BENIGN class accounts for ~83% of samples. A model trained naively could predict BENIGN for every flow and still reach roughly 83% accuracy while detecting no attacks at all. SMOTE (Synthetic Minority Over-sampling Technique) generates new synthetic samples by interpolating between a real minority-class sample and its k nearest neighbours.

SMOTE is applied only to the three rarest classes and only within the training fold to prevent data leakage:

Class | Before SMOTE | After SMOTE | k_neighbors used
Heartbleed | ~8 | 5,000 | 2
Infiltration | ~25 | 5,000 | 2
Web_Attack_SQL_Injection | ~15 | 5,000 | 2
Implementation note: SMOTE is applied only to the isolated rare-class rows, then the oversampled subset is recombined with the non-rare training rows. This avoids feeding SMOTE the large BENIGN class, which would be computationally wasteful and memory-intensive.
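
The interpolation idea behind SMOTE can be sketched in a few lines of standard-library Python. This is an illustrative re-implementation under simplifying assumptions (brute-force neighbour search, hypothetical function name), not the library routine the notebook presumably calls.

```python
import random


def smote_like(samples, n_new, k=2, seed=0):
    """Create n_new synthetic rows for ONE minority class.

    Each synthetic row lies on the segment between a randomly chosen real
    row and one of its k nearest neighbours (squared Euclidean distance).
    Requires at least two real rows.
    """
    rng = random.Random(seed)

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        neighbours = sorted(others, key=lambda s: sqdist(base, s))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic
```

With only a handful of real rows per class, k must stay below the number of available samples, which is consistent with the small k_neighbors value listed in the table.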

3.3 Zero-Day Holdout Protocol & Data Persistence

Before any splitting, the four zero-day classes are extracted into a completely separate slice (X_zd, y_zd). The remaining data forms the training pool from which stratified train / val / test splits are created.

StandardScaler is fit exclusively on X_train; transform-only is applied to all other splits. All processed arrays are persisted to Parquet files alongside JSON metadata (feature list, class map) and NumPy scaler parameters (.npy) for fast, reproducible reloads across sessions.
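
A minimal sketch of the holdout and train-only scaling logic follows, using plain Python lists. The helper names are hypothetical; the scaling mirrors StandardScaler's documented behaviour (population standard deviation, zero-variance columns left unscaled) rather than calling scikit-learn.

```python
from statistics import fmean, pstdev

ZERO_DAY = {"Heartbleed", "Infiltration",
            "Web_Attack_SQL_Injection", "Web_Attack_XSS"}


def split_zero_day(rows, labels):
    """Separate zero-day rows from the training pool before any other split."""
    pool, zero_day = [], []
    for x, y in zip(rows, labels):
        (zero_day if y in ZERO_DAY else pool).append((x, y))
    return pool, zero_day


def fit_scaler(train_rows):
    """Compute per-column mean and std from the TRAINING rows only."""
    cols = list(zip(*train_rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return means, stds


def transform(rows, means, stds):
    """Apply previously fitted statistics to any split (val, test, zero-day)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]
```

Fitting on the training rows alone keeps validation, test and zero-day statistics out of the scaler, so held-out flows are scored exactly as unseen production traffic would be.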

3.4 Non-IID Client Partitioning (Dirichlet Split)

In a real federation, data is not uniformly distributed across clients — a university campus network sees different traffic than a hospital or financial institution. To simulate this realistic heterogeneity, a Dirichlet(α) distribution is used to partition training data across K = 5 clients:

For each class c, a proportion vector p_c is drawn from Dirichlet(α · 1_K), where 1_K is the all-ones vector of length K, and used to allocate that class's samples among clients. Smaller α values produce more skewed allocations:

α value | Data heterogeneity | Description
0.1 | Highly non-IID | Each client sees mostly one traffic class
0.5 | Moderate non-IID | Default; typical research setting
10.0 | Near-IID | Each client holds a representative sample
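
The per-class Dirichlet allocation can be sketched with the standard library alone, sampling the Dirichlet vector as normalised Gamma draws. The function name, seed handling and rounding of cut points are assumptions made for illustration.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, n_clients=5, alpha=0.5, seed=42):
    """Return one list of sample indices per client.

    For every class, a proportion vector is drawn from Dirichlet(alpha * 1_K)
    and the class's (shuffled) indices are cut according to it.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(n_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalised.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for k in range(n_clients):
            cumulative += gammas[k] / total
            # The last client takes the remainder so no index is dropped.
            end = len(idxs) if k == n_clients - 1 else round(cumulative * len(idxs))
            clients[k].extend(idxs[start:end])
            start = end
    return clients
```

Calling `dirichlet_partition(y_train, n_clients=5, alpha=0.1)` would give each client a heavily skewed class mix, while `alpha=10.0` approaches an even split, matching the table above.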

3.5 Model Architecture — Dual-Head Autoencoder

The core model is a dual-head autoencoder built in TensorFlow 2.15 / Keras. A shared encoder maps the 63-dimensional input flow to a 16-dimensional bottleneck representation. Two separate heads branch from this bottleneck:

Figure 2: Dual-head autoencoder — shared encoder bottleneck feeds both a reconstruction decoder and a binary classifier

The model is trained with a weighted combined loss:

ℒ = λ · MSE(X, X̂) + (1 − λ) · BCE(y, ŷ)
λ = 0.5 (default)  |  Increasing λ → higher zero-day recall  |  Decreasing λ → higher supervised F1
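
As a worked illustration of the combined loss for a single sample, the sketch below evaluates ℒ in plain Python. The helper names and the clipping constant are assumptions; in Keras the same weighting is commonly expressed through the `loss_weights` argument of `Model.compile`, though the source does not say how it is wired.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error over the feature vector."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def bce(y, y_hat, eps=1e-7):
    """Binary cross-entropy for one label, with clipping for stability."""
    p = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def combined_loss(x, x_hat, y, y_hat, lam=0.5):
    """L = lam * MSE(x, x_hat) + (1 - lam) * BCE(y, y_hat)."""
    return lam * mse(x, x_hat) + (1.0 - lam) * bce(y, y_hat)
```

For example, with x = [1, 2], x̂ = [1, 4], y = 1 and ŷ = 0.5, the MSE term is 2 and the BCE term is ln 2, so λ = 0.5 gives ℒ = 1 + 0.5 · ln 2. Setting λ = 1 leaves only the reconstruction term, and λ = 0 only the classification term.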

4. A/R/C Trust Mechanism

After each local training round, the central server evaluates every client's updated model on the shared validation set and computes three trust signals. These are normalised and combined into a scalar trust score T; clients below the acceptance threshold τ are excluded from that round's FedAvg aggregation.

A: Accuracy (↑ higher = more trusted). Validation accuracy of the client's locally updated model. A model corrupted by poisoning will generalise poorly.
R: Reconstruction (↓ lower = more trusted). Validation reconstruction MSE. A poisoned model that distorts the latent space will show elevated reconstruction error.
C: Cosine Distance (↓ lower = more trusted). Divergence between the client's weight vector and the current global weights, given here as the L₂ norm of their difference. Large deviations signal abnormal updates.

T = α · A_norm + β · R_norm + γ · C_norm
Default weights: α = 0.4  |  β = 0.3  |  γ = 0.3  |  Acceptance threshold τ = 0.60

Because lower R and C indicate more trust while clients are accepted when T is high, R_norm and C_norm must be oriented during normalisation so that larger values are better. The trust weight α here is unrelated to the Dirichlet α of Section 3.4.

Each round, clients are ranked by T in descending order. The top TOPK = ⌈0.6 × K⌉ clients with T ≥ τ are accepted for FedAvg aggregation; with K = 5 this gives TOPK = 3. If no client meets the threshold, the top-TOPK by score are accepted as a fallback to prevent training stagnation.
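
The scoring and selection rule can be sketched as follows. The source does not specify the normalisation, so min-max scaling across the clients of a round (inverted for R and C so that higher is better) is an assumption, as are the function names; the divergence input is whatever distance the server computes between client and global weights.

```python
import math


def min_max(values, invert=False):
    """Scale values to [0, 1] across clients; invert when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all clients identical on this signal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def trust_scores(acc, recon_mse, divergence, weights=(0.4, 0.3, 0.3)):
    """T = alpha * A_norm + beta * R_norm + gamma * C_norm per client."""
    a = min_max(acc)
    r = min_max(recon_mse, invert=True)
    c = min_max(divergence, invert=True)
    alpha, beta, gamma = weights
    return [alpha * ai + beta * ri + gamma * ci for ai, ri, ci in zip(a, r, c)]


def select_clients(scores, tau=0.60, frac=0.6):
    """Accept up to ceil(frac * K) clients with T >= tau, best first.

    Falls back to the top-K by score if nobody reaches the threshold.
    """
    top_k = math.ceil(frac * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    accepted = [i for i in ranked if scores[i] >= tau][:top_k]
    return accepted or ranked[:top_k]
```

Under this relative normalisation the worst client on every signal always scores 0, so the gate measures how far a client deviates from its peers rather than against an absolute standard.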

Figure 3: Per-client trust scores T across 8 federated learning rounds
Figure 4: Round × Client acceptance heatmap (1 = accepted, 0 = rejected)
Figure 5: Stacked bar — mean A/R/C contribution to trust score per client

5. Federated Training Loop

Training runs for 8 communication rounds. Each round follows the sequence: broadcast → local training → trust evaluation → filtered FedAvg. The entire loop is logged to CSV files for reproducibility.

Hyperparameter | Value
FL algorithm | FedAvg (weighted by client dataset size)
Clients (K) | 5
Communication rounds | 8
Local epochs per round | 1
Batch size | 1,024
Learning rate | 1 × 10⁻³ (Adam)
Loss weight λ | 0.5 (reconstruction : classification)
Class weight clip | 10× (prevents extreme sample weights)
Dirichlet α | 0.5 (moderate non-IID)
Figure 6: AUROC and F1 trajectories for validation, test, and zero-day sets across 8 FL rounds
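
The aggregation step at the end of each round, FedAvg weighted by client dataset size and restricted to accepted clients, can be sketched as below. Weights are shown as flat lists of floats for brevity; the function names and the `accepted` index list are illustrative assumptions.

```python
def fed_avg(client_weights, client_sizes):
    """Average flat weight vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


def filtered_fed_avg(client_weights, client_sizes, accepted):
    """Aggregate only the clients whose indices passed the trust gate."""
    kept_weights = [client_weights[i] for i in accepted]
    kept_sizes = [client_sizes[i] for i in accepted]
    return fed_avg(kept_weights, kept_sizes)
```

For instance, two accepted clients holding weights [1.0] and [3.0] with 1 and 3 samples aggregate to [2.5]; a third, rejected client has no influence on the result regardless of its size.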

6. Results & Evaluation

6.1 Seen-Attack Performance

After the final FL round, the global model is evaluated on the held-out test set, which contains only the 10 seen attack classes plus BENIGN:

Metric | Value
Test AUROC | ≥0.99
Test F1 | ≥0.97
Val AUROC | ≥0.98
Recon MSE | <0.05
Trust τ | 0.60
FL Rounds | 8
Note: Exact metric values will vary between runs due to stochastic FL training and Dirichlet partitioning. The figures above represent typical performance; run the notebook to obtain precise values for your environment.

6.2 Zero-Day Detection

At inference time, the model can detect zero-day attacks through two complementary strategies. The evaluation is performed on a balanced mixed zero-day set composed of all held-out zero-day samples plus an equal number of benign samples drawn from the test set:

Strategy | Score used | Threshold | Advantage
Classifier-only | P(attack) from sigmoid head | τ* tuned on PR curve | Simple deployment: single forward pass
Fusion score | λ · P(attack) + (1−λ) · MSE_norm | Grid search over λ and τ | Consistently higher zero-day recall
Figure 7: PR curve on zero-day mixed set with optimal τ* marked
Figure 8: ROC curve on zero-day mixed set
Figure 9: Zero-day threshold sweep — F1 is maximised at τ* (optimal operating point)
Figure 10: Grid search over λ — optimal blend of classifier score and reconstruction MSE
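
A minimal sketch of the fusion score and its grid search is given below. It assumes the reconstruction error has already been scaled to [0, 1] (the scaling method is not stated in the source), uses F1 as the selection criterion, and searches an evenly spaced grid; all names are hypothetical.

```python
def f1_at(scores, labels, tau):
    """F1 of the rule 'predict attack when score >= tau'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= tau and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < tau and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fuse(p_attack, mse_norm, lam):
    """Fusion score: lam * P(attack) + (1 - lam) * normalised MSE."""
    return [lam * p + (1 - lam) * m for p, m in zip(p_attack, mse_norm)]


def grid_search(p_attack, mse_norm, labels, steps=11):
    """Return (best_f1, lam, tau) over an evenly spaced grid on [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    best = (-1.0, None, None)
    for lam in grid:
        fused = fuse(p_attack, mse_norm, lam)
        for tau in grid:
            f1 = f1_at(fused, labels, tau)
            if f1 > best[0]:
                best = (f1, lam, tau)
    return best
```

Setting lam = 1 recovers the classifier-only strategy, so the grid search can never score below it on the set it is tuned on. The gain comes from samples where the classifier is unsure but the reconstruction error is high.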

6.3 Score & Reconstruction Distributions

Figure 11: Classifier score and reconstruction MSE distributions for benign vs zero-day traffic

6.4 Confusion Matrices (Zero-Day)

In a security context, false negatives are the most costly error, since a missed attack may result in a breach. The decision threshold is therefore tuned to maximise F1, which penalises missed attacks through its recall term, rather than accuracy:

Figure 12a: Confusion matrix — classifier-only strategy at optimal threshold τ*
Figure 12b: Confusion matrix — fusion strategy at optimal λ and τ (typically higher recall)

6.5 Bottleneck Embeddings (t-SNE)

The 16-dimensional bottleneck representation z is projected to 2D using t-SNE to visualise how well the encoder separates benign from attack traffic in the latent space:

Figure 13a: t-SNE of bottleneck embeddings — test set (benign vs seen attacks)
Figure 13b: t-SNE of bottleneck embeddings — zero-day mixed set (benign vs unseen attacks)

7. Robustness Against Poisoning Attacks

In a federated setting, a malicious participant can submit crafted weight updates designed to degrade the global model. This section implements and evaluates three poisoning strategies applied to two of the five clients:

Attack | Mechanism | Parameters
Label Flip | Randomly flip 35% of training labels (0→1 or 1→0); corrupts the supervised signal | flip_frac = 0.35
Feature Noise | Add Gaussian noise (μ=0, σ=3.5) to 35% of feature vectors; corrupts the input distribution | noise_frac = 0.35, noise_std = 3.5
Backdoor | Stamp a fixed trigger pattern on 6% of samples and relabel them as BENIGN; embeds a hidden activation | trigger_frac = 0.06, n_feats = 4, trigger_val = 15.0
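
The three poisoning schemes can be sketched with the parameters from the table. Which features carry the backdoor trigger is not stated in the source, so stamping the first n_feats columns is an assumption, as are the function names and the use of plain lists for a client's local data.

```python
import random


def label_flip(labels, flip_frac=0.35, seed=0):
    """Flip a random fraction of binary labels (0 -> 1, 1 -> 0)."""
    rng = random.Random(seed)
    out = list(labels)
    for i in rng.sample(range(len(out)), round(flip_frac * len(out))):
        out[i] = 1 - out[i]
    return out


def feature_noise(rows, noise_frac=0.35, noise_std=3.5, seed=0):
    """Add zero-mean Gaussian noise to a random fraction of feature vectors."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    for i in rng.sample(range(len(out)), round(noise_frac * len(out))):
        out[i] = [v + rng.gauss(0.0, noise_std) for v in out[i]]
    return out


def backdoor(rows, labels, trigger_frac=0.06, n_feats=4,
             trigger_val=15.0, seed=0):
    """Stamp a fixed trigger on some rows and relabel them as BENIGN (0)."""
    rng = random.Random(seed)
    xs, ys = [list(r) for r in rows], list(labels)
    for i in rng.sample(range(len(xs)), round(trigger_frac * len(xs))):
        xs[i][:n_feats] = [trigger_val] * n_feats  # assumed trigger location
        ys[i] = 0
    return xs, ys
```

Each function copies its input, so the clean data remains available for the unpoisoned baseline experiment.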

7.1 Experimental Design

Three experiments isolate the effect of the trust mechanism:

EXP 1: Clean + Trust ON
No poisoning applied. Trust gate active. Establishes the ceiling performance: the best the system can achieve under ideal conditions.

EXP 2: Poisoned + Trust OFF
Two clients poisoned. Trust gate disabled, so all clients are accepted every round. Measures the attack damage to the global model.

EXP 3: Poisoned + Trust ON
Two clients poisoned. Trust gate active, so A/R/C filters low-trust clients. Measures the defence recovery relative to EXP 1.

Expected outcome: A poisoned client will exhibit low validation accuracy (↓ A), high reconstruction error (↑ R), and large weight deviation (↑ C). All three signals push T below the acceptance threshold τ, causing automatic exclusion without any knowledge of which clients are malicious.

7.2 Poisoning Experiment Results

Figure 14: Poisoning experiment comparison — Trust ON (EXP 3) recovers performance lost under poisoning (EXP 2)
Experiment | Test AUROC | Test F1 | Zero-Day AUROC | Zero-Day F1 (best τ)
✅ Clean + Trust ON | baseline | baseline | baseline | baseline
☠️ Poisoned + Trust OFF | ↓ degraded | ↓ degraded | ↓ degraded | ↓ degraded
🛡️ Poisoned + Trust ON | ≈ baseline | ≈ baseline | ≈ baseline | ≈ baseline

8. Conclusion

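This work demonstrates an end-to-end Trust-Aware Federated Intrusion Detection System that achieves strong performance on both seen and unseen attack types while keeping each participant's raw network data local. Its key contributions, as described in the preceding sections, are the dual-head autoencoder with a fused classifier and reconstruction score, the A/R/C trust gate applied before FedAvg aggregation, the zero-day holdout evaluation protocol, and the three-scheme poisoning study.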

Future Directions